Part 1
This document shows an exemplary downstream analysis on immune gene
expression using the SingleCellAlleleExperiment (SCAE)
package. The used dataset is under controlled access and not for public
usage. The here shown results are part of the masters thesis:
Development of a multi-layer R object to study immune gene expression in spatial (and single-cell) transcriptomics at allele, gene and functional level written by Jonas Schuck.
The project was conducted in the research group ‘Computational Immunology’ lead by Dr. Katharina Imkeller at the Edinger Institute at the University Hospital Frankfurt. Dr. Imkeller provided supervision of the project. Primary corrector is Prof. Dr. Florian Buettner at Goethe University Frankfurt.
This document gives only a brief introduction to the
SingleCellAlleleExperiment (SCAE) package as well as an
overview for the performed steps described in the
package application section of the thesis. For further
information conduct the thesis.
The SingleCellAlleleExperiment (SCAE) class is a
container for storing and handling allele-aware quantification data for
immune genes. The SCAE class is derived from the SingleCellExperiment(SCE)
class uses the same object architecture. However, multiple data layers
are integrated into the object during object generation. Data from the
allele-aware quantification method contains information about alleles
for immune genes. During object generation the allele information is
aggregated into two additional data layers via an ontology based MHC
design principle and extended to the inital raw data using a lookup
table. Thus the final SCAE object contains non-immune genes, alleles for
immune genes, an immune gene layer and a functional class layer (refer
to Figure 1).
For example the counts of the allele A*01:01:01:01
present in the raw input data will be transformed into the
HLA-A immune gene layer and the HLA-class I
functional class layer. The information for the lookup table is
retrieved from the IPD-IMGT/HLA database.
The implemented object follows similar conventions like the SCE class, where rows should represent features (genes, transcripts) and columns should represent cells. Established single cell packages like scater and scran can be used with the SCAE object to perform downstream analysis on immune gene expression.
Figure 1: Scheme of SingleCellAlleleExperiment
object structure with lookup table.
The read in function of the SCAE package
readAlleleCounts expects specific files and file
identifiers. The stated input directory should contain the following
files:
The used dataset for later downstream analysis and testing of the functionalities of the multi-layer object is a dual-center, two-cohort study where whole blood and peripheral blood mononuclear cells underwent scRNA-sequencing. The whole transcriptome dataset is under controlled access and not for public use. The corresponding publication Severe COVID-19 Is Marked by a Dysregulated Myeloid Cell Compartment can be found here.
The following package are abundant for performing the downstream analysis and visualization.
library(SingleCellAlleleExperiment)
library(scran)
library(scater)
library(gridExtra)
library(tidyverse)
library(patchwork)
library(cowplot)
library(ggplot2)
First the directory path is stated where the input data is saved.
dir_path <- "./input_files/counts_unfiltered"
list.files("./input_files/counts_unfiltered")
## [1] "cells_x_genes.barcodes.txt" "cells_x_genes.genes.txt"
## [3] "cells_x_genes.mtx" "lookup_table_HLA_only.csv"
Starting the workflow with assessing the quality of the underlying
dataset. Prior to filtering out low-quality cells the
plotKnee() function of the SCAE package can be used. The
function offers two different options for the knee plot. A default one
(plot B) and an advanced one (plot A)
with a computed knee and inflection point on the curve which pose
suggestions for potential threshold for filtering out low-quality
cells.
ega_knee_adv <- plotKnee(dir_path, "advanced")
ega_knee_def <- plotKnee(dir_path, "default")
knee <- ega_knee_adv + labs(title = "A") + theme_bw() +
ega_knee_def + labs(title = "B") + theme_bw() +
theme(axis.title.y=element_blank(), legend.position = "none")
knee
In plot A, the advanced knee-plot, the inflection
point suggests a threshold for filtering out low-quality cells at about
110 total UMI counts. This threshold is passed to the
filter_threshold parameter of the
readAlleleCounts() function. The dataset used is a whole
transcriptome dataset, which is why exp_type parameter is
set to WTA. The alternative would be to state
Amplicon. The symbols parameter is for
retrieving NCBI gene names for the ensemble gene names present in the
raw data with the help of the biomaRt
package. This parameter is not abundant to state at function execution
as the standard value is biomart. For offline execution,
orgdb is suggested, which uses the org.Hs.eg.db
package.
SCAE objects pose the same object architecture as SCE objects.
scae <- readAlleleCounts(dir_path,
sample.names = "EGA_WTA",
filter_threshold = 110,
exp_type = "WTA",
symbols = "biomart")
## [1] "Runtime check (1/2) Read_in: 7.39461421966553"
## [1] " Generating SCAE (1/5) extending rowData: 12.8516063690186"
## [1] " Generating SCAE (2/5) filtering and normalization: 0.339311838150024"
## [1] " Generating SCAE (3/5) alleles2genes: 2.75878572463989"
## [1] " Generating SCAE (4/5) genes2functional: 1.73276138305664"
## [1] " Generating SCAE (5/5) log_transform: 1.73276138305664"
## [1] "Runtime check (2/2) Generating SCAE completed: 19.5463762283325"
## [1] "Total runtime, completed read_in, filtering and normalization and generating scae object 26.9412634372711"
scae
## class: SingleCellAlleleExperiment
## dim: 62718 47788
## metadata(0):
## assays(2): counts logcounts
## rownames(62718): ENSG00000160072.20 ENSG00000279928.2 ... HLA_class_I
## HLA_class_II
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(47788): AAATCCTGTAAACGTACCAAAGTCATT
## AAATCCTGTAAACGTACCACGTGGGAG ... TTGTTCCAATTGGTATGACATAGGTCA
## TTGTTCCAATTGGTATGAGCAACATTA
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
Two new classification columns are introduced in the
rowData slot. Namely the NI_I column and
Quant_type columns. Both columns are used to identify each
row of the object to its corresponding data layer.
rowData(scae)
## DataFrame with 62718 rows and 4 columns
## Ensembl.ID Symbol NI_I Quant_type
## <character> <character> <character> <character>
## ENSG00000160072.20 ENSG00000160072.20 ATAD3B NI G
## ENSG00000279928.2 ENSG00000279928.2 DDX11L17 NI G
## ENSG00000228037.1 ENSG00000228037.1 NA NI G
## ENSG00000142611.17 ENSG00000142611.17 PRDM16 NI G
## ENSG00000284616.1 ENSG00000284616.1 NA NI G
## ... ... ... ... ...
## HLA-DQA1 HLA-DQA1 HLA-DQA1 I G
## HLA-DQB1 HLA-DQB1 HLA-DQB1 I G
## HLA-DPB1 HLA-DPB1 HLA-DPB1 I G
## HLA_class_I HLA_class_I HLA_class_I I F
## HLA_class_II HLA_class_II HLA_class_II I F
As the object extends the count matrix during the object generation, its abundant to compute scaling factors on the raw data prior to extending and integrating the data layers. The scaling factors are used for scaling normalization in a later step of the SCAE constructor.
colData(scae)
## DataFrame with 47788 rows and 3 columns
## Sample Barcode sizeFactor
## <character> <character> <numeric>
## AAATCCTGTAAACGTACCAAAGTCATT EGA_WTA AAATCCTGTAAACGTACCAA.. 0.361248
## AAATCCTGTAAACGTACCACGTGGGAG EGA_WTA AAATCCTGTAAACGTACCAC.. 0.188150
## AAATCCTGTAAACGTACCTCTCACGGA EGA_WTA AAATCCTGTAAACGTACCTC.. 0.457581
## AAATCCTGTAAACGTACCTGCATAGTA EGA_WTA AAATCCTGTAAACGTACCTG.. 0.871510
## AAATCCTGTAAACTGCGCAACAACGCG EGA_WTA AAATCCTGTAAACTGCGCAA.. 1.846879
## ... ... ... ...
## TTGTTCCAATTGGTATGAATCATTCTG EGA_WTA TTGTTCCAATTGGTATGAAT.. 0.931718
## TTGTTCCAATTGGTATGAATGGGACTC EGA_WTA TTGTTCCAATTGGTATGAAT.. 2.170497
## TTGTTCCAATTGGTATGACATAACGTT EGA_WTA TTGTTCCAATTGGTATGACA.. 1.467569
## TTGTTCCAATTGGTATGACATAGGTCA EGA_WTA TTGTTCCAATTGGTATGACA.. 0.704433
## TTGTTCCAATTGGTATGAGCAACATTA EGA_WTA TTGTTCCAATTGGTATGAGC.. 2.126846
Additionally to the established getters from the SCE
package, new getters are implemented to retrieve the different data
layers integrated in the SCAE object.
get_nigenes(scae)
## class: SingleCellAlleleExperiment
## dim: 62695 47788
## metadata(0):
## assays(2): counts logcounts
## rownames(62695): ENSG00000160072.20 ENSG00000279928.2 ...
## ENSG00000277475.1 ENSG00000275405.1
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(47788): AAATCCTGTAAACGTACCAAAGTCATT
## AAATCCTGTAAACGTACCACGTGGGAG ... TTGTTCCAATTGGTATGACATAGGTCA
## TTGTTCCAATTGGTATGAGCAACATTA
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
head(rownames(get_nigenes(scae)))
## [1] "ENSG00000160072.20" "ENSG00000279928.2" "ENSG00000228037.1"
## [4] "ENSG00000142611.17" "ENSG00000284616.1" "ENSG00000157911.11"
#### Alleles
get_alleles(scae)
## class: SingleCellAlleleExperiment
## dim: 14 47788
## metadata(0):
## assays(2): counts logcounts
## rownames(14): A*02:01:01:01 A*26:01:01:01 ... DPB1*10:01:01:01
## DPB1*13:01:01:01
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(47788): AAATCCTGTAAACGTACCAAAGTCATT
## AAATCCTGTAAACGTACCACGTGGGAG ... TTGTTCCAATTGGTATGACATAGGTCA
## TTGTTCCAATTGGTATGAGCAACATTA
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
head(rownames(get_alleles(scae)))
## [1] "A*02:01:01:01" "A*26:01:01:01" "B*51:01:01:01" "B*57:01:01:01"
## [5] "C*04:01:01:01" "C*06:02:01:01"
#### Immune genes
get_agenes(scae)
## class: SingleCellAlleleExperiment
## dim: 7 47788
## metadata(0):
## assays(2): counts logcounts
## rownames(7): HLA-A HLA-B ... HLA-DQB1 HLA-DPB1
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(47788): AAATCCTGTAAACGTACCAAAGTCATT
## AAATCCTGTAAACGTACCACGTGGGAG ... TTGTTCCAATTGGTATGACATAGGTCA
## TTGTTCCAATTGGTATGAGCAACATTA
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
head(rownames(get_agenes(scae)))
## [1] "HLA-A" "HLA-B" "HLA-C" "HLA-DRB1" "HLA-DQA1" "HLA-DQB1"
#### Functional class
get_func(scae)
## class: SingleCellAlleleExperiment
## dim: 2 47788
## metadata(0):
## assays(2): counts logcounts
## rownames(2): HLA_class_I HLA_class_II
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(47788): AAATCCTGTAAACGTACCAAAGTCATT
## AAATCCTGTAAACGTACCACGTGGGAG ... TTGTTCCAATTGGTATGACATAGGTCA
## TTGTTCCAATTGGTATGAGCAACATTA
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
head(rownames(get_func(scae)))
## [1] "HLA_class_I" "HLA_class_II"
#### Unknown alleles
In case allele information that is stated in the raw data is not
found in the lookup table, this marks an potentially invalid
allele-identifier. These alleles will be classified accordingly in the
Quant_type classification column in the
colData slot and can be retrieved with the
get_unknown() getter.
get_unknown(scae)
## class: SingleCellAlleleExperiment
## dim: 0 47788
## metadata(0):
## assays(2): counts logcounts
## rownames(0):
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(47788): AAATCCTGTAAACGTACCAAAGTCATT
## AAATCCTGTAAACGTACCACGTGGGAG ... TTGTTCCAATTGGTATGACATAGGTCA
## TTGTTCCAATTGGTATGAGCAACATTA
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
head(rownames(get_unknown(scae)))
## character(0)
Checking the expression for the allele-layer, immune gene layer and
functional class layer. Allele identifiers are in the form of
A*02:01:01:01. The immune genes are in the form of
HLA-A and the functional classes HLA_class_I.
Here we see that HLA-class I and HLA-C are the most abundant functional
class and immune gene respectively in the underlying dataset.
all_alleles <- c(rownames(get_alleles(scae)),
rownames(get_agenes(scae)),
rownames(get_func(scae)))
plotExpression(scae, all_alleles)
# Downstream analysis
In the following sections, main steps for dimensional reduction are performed, offering insights into the different data layers of the SCAE object.
Do we need to do this?????
scae <- scae[rowSums(counts(scae)) > 0, ]
scae
## class: SingleCellAlleleExperiment
## dim: 29923 47788
## metadata(0):
## assays(2): counts logcounts
## rownames(29923): ENSG00000160072.20 ENSG00000228037.1 ... HLA_class_I
## HLA_class_II
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(47788): AAATCCTGTAAACGTACCAAAGTCATT
## AAATCCTGTAAACGTACCACGTGGGAG ... TTGTTCCAATTGGTATGACATAGGTCA
## TTGTTCCAATTGGTATGAGCAACATTA
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
The non-imune genes are combined with each of the integrated immune gene allele-aware layers to determine three different subsets.
redim_scae <- scae
ni_a <- redim_scae[c(rownames(get_nigenes(redim_scae)), rownames(get_alleles(redim_scae))), ]
ni_a
## class: SingleCellAlleleExperiment
## dim: 29914 47788
## metadata(0):
## assays(2): counts logcounts
## rownames(29914): ENSG00000160072.20 ENSG00000228037.1 ...
## DPB1*10:01:01:01 DPB1*13:01:01:01
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(47788): AAATCCTGTAAACGTACCAAAGTCATT
## AAATCCTGTAAACGTACCACGTGGGAG ... TTGTTCCAATTGGTATGACATAGGTCA
## TTGTTCCAATTGGTATGAGCAACATTA
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
ni_g <- redim_scae[c(rownames(get_nigenes(redim_scae)), rownames(get_agenes(redim_scae))), ]
ni_g
## class: SingleCellAlleleExperiment
## dim: 29907 47788
## metadata(0):
## assays(2): counts logcounts
## rownames(29907): ENSG00000160072.20 ENSG00000228037.1 ... HLA-DQB1
## HLA-DPB1
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(47788): AAATCCTGTAAACGTACCAAAGTCATT
## AAATCCTGTAAACGTACCACGTGGGAG ... TTGTTCCAATTGGTATGACATAGGTCA
## TTGTTCCAATTGGTATGAGCAACATTA
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
ni_f <- redim_scae[c(rownames(get_nigenes(redim_scae)), rownames(get_func(redim_scae))), ]
ni_f
## class: SingleCellAlleleExperiment
## dim: 29902 47788
## metadata(0):
## assays(2): counts logcounts
## rownames(29902): ENSG00000160072.20 ENSG00000228037.1 ... HLA_class_I
## HLA_class_II
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(47788): AAATCCTGTAAACGTACCAAAGTCATT
## AAATCCTGTAAACGTACCACGTGGGAG ... TTGTTCCAATTGGTATGACATAGGTCA
## TTGTTCCAATTGGTATGAGCAACATTA
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
Using the modelGeneVar() function prior to performing
HVGs using getTopHVGs. Both functions are part from the
scran
package.
df_ni_a <- modelGeneVar(ni_a)
head(df_ni_a)
## DataFrame with 6 rows and 6 columns
## mean total tech bio p.value
## <numeric> <numeric> <numeric> <numeric> <numeric>
## ENSG00000160072.20 0.002933487 0.003856521 0.003383365 4.73157e-04 1.75665e-02
## ENSG00000228037.1 0.000567862 0.000769625 0.000654949 1.14677e-04 4.17255e-03
## ENSG00000142611.17 0.000240115 0.000336065 0.000276939 5.91253e-05 6.49211e-04
## ENSG00000157911.11 0.001207261 0.001281901 0.001392406 -1.10505e-04 8.84075e-01
## ENSG00000269896.2 0.000138593 0.000201914 0.000159847 4.20670e-05 3.67483e-05
## ENSG00000228463.10 0.001309151 0.001739702 0.001509922 2.29781e-04 1.09352e-02
## FDR
## <numeric>
## ENSG00000160072.20 0.061996901
## ENSG00000228037.1 0.017757513
## ENSG00000142611.17 0.003390449
## ENSG00000157911.11 1.000000000
## ENSG00000269896.2 0.000244906
## ENSG00000228463.10 0.041368136
df_ni_g <- modelGeneVar(ni_g)
head(df_ni_g)
## DataFrame with 6 rows and 6 columns
## mean total tech bio p.value
## <numeric> <numeric> <numeric> <numeric> <numeric>
## ENSG00000160072.20 0.002933487 0.003856521 0.003375378 4.81143e-04 6.77935e-03
## ENSG00000228037.1 0.000567862 0.000769625 0.000653403 1.16223e-04 1.03295e-03
## ENSG00000142611.17 0.000240115 0.000336065 0.000276286 5.97791e-05 8.93833e-05
## ENSG00000157911.11 0.001207261 0.001281901 0.001389119 -1.07218e-04 9.09350e-01
## ENSG00000269896.2 0.000138593 0.000201914 0.000159470 4.24443e-05 2.01703e-06
## ENSG00000228463.10 0.001309151 0.001739702 0.001506358 2.33345e-04 3.65000e-03
## FDR
## <numeric>
## ENSG00000160072.20 2.39261e-02
## ENSG00000228037.1 4.39626e-03
## ENSG00000142611.17 4.66851e-04
## ENSG00000157911.11 1.00000e+00
## ENSG00000269896.2 1.34456e-05
## ENSG00000228463.10 1.38086e-02
df_ni_f <- modelGeneVar(ni_f)
head(df_ni_f)
## DataFrame with 6 rows and 6 columns
## mean total tech bio p.value
## <numeric> <numeric> <numeric> <numeric> <numeric>
## ENSG00000160072.20 0.002933487 0.003856521 0.003387065 4.69457e-04 1.10875e-02
## ENSG00000228037.1 0.000567862 0.000769625 0.000655665 1.13960e-04 2.06304e-03
## ENSG00000142611.17 0.000240115 0.000336065 0.000277242 5.88225e-05 2.31380e-04
## ENSG00000157911.11 0.001207261 0.001281901 0.001393929 -1.12028e-04 9.07633e-01
## ENSG00000269896.2 0.000138593 0.000201914 0.000160022 4.18921e-05 7.79048e-06
## ENSG00000228463.10 0.001309151 0.001739702 0.001511573 2.28129e-04 6.37534e-03
## FDR
## <numeric>
## ENSG00000160072.20 3.91194e-02
## ENSG00000228037.1 8.77760e-03
## ENSG00000142611.17 1.20830e-03
## ENSG00000157911.11 1.00000e+00
## ENSG00000269896.2 5.19107e-05
## ENSG00000228463.10 2.41147e-02
Plotting the variance for each data-layer.
#plot variance
par(mfrow=c(1,3))
fit_ni_a <- metadata(df_ni_a)
plot(fit_ni_a$mean, fit_ni_a$var,
xlab = "mean of log-expression", ylab = "variance of log-expression", main = "NI-A")
points(fit_ni_a$mean, fit_ni_a$var, col = "red", pch = 16)
curve(fit_ni_a$trend(x), col = "dodgerblue", add = TRUE, lwd = 2)
fit_ni_g <- metadata(df_ni_g)
plot(fit_ni_g$mean, fit_ni_g$var,
xlab = "mean of log-expression", ylab = "variance of log-expression", main = "NI-G")
points(fit_ni_g$mean, fit_ni_g$var, col = "red", pch = 16)
curve(fit_ni_g$trend(x), col = "dodgerblue", add = TRUE, lwd = 2)
fit_ni_f <- metadata(df_ni_f)
plot(fit_ni_f$mean, fit_ni_f$var,
xlab = "mean of log-expression", ylab = "variance of log-expression", main = "NI-F")
points(fit_ni_f$mean, fit_ni_f$var, col = "red", pch = 16)
curve(fit_ni_f$trend(x), col = "dodgerblue", add = TRUE, lwd = 2)
par(mfrow=c(1,1))
Compute a list of HVGs for each data layer. Return the top 0.1 % HVGs
per layer using getTopHVGs.
top_ni_a <- getTopHVGs(df_ni_a, prop = 0.1)
length(top_ni_a)
## [1] 1726
head(top_ni_a, n = 15)
## [1] "ENSG00000244734.4" "ENSG00000188536.13" "ENSG00000206172.8"
## [4] "ENSG00000143546.10" "ENSG00000147454.14" "ENSG00000115523.17"
## [7] "ENSG00000090382.7" "ENSG00000197746.15" "ENSG00000163220.11"
## [10] "ENSG00000162722.9" "C*04:01:01:01" "C*06:02:01:01"
## [13] "ENSG00000019582.17" "ENSG00000223609.11" "ENSG00000087086.15"
top_ni_g <- getTopHVGs(df_ni_g, prop = 0.1)
length(top_ni_g)
## [1] 1740
head(top_ni_g, n = 15)
## [1] "ENSG00000244734.4" "ENSG00000188536.13" "ENSG00000206172.8"
## [4] "ENSG00000143546.10" "ENSG00000147454.14" "ENSG00000115523.17"
## [7] "ENSG00000090382.7" "ENSG00000197746.15" "ENSG00000163220.11"
## [10] "ENSG00000162722.9" "ENSG00000019582.17" "ENSG00000087086.15"
## [13] "ENSG00000223609.11" "ENSG00000158869.11" "ENSG00000133742.14"
top_ni_f <- getTopHVGs(df_ni_f, prop = 0.1)
length(top_ni_f)
## [1] 1718
head(top_ni_f, n = 15)
## [1] "ENSG00000244734.4" "ENSG00000188536.13" "ENSG00000206172.8"
## [4] "ENSG00000143546.10" "ENSG00000147454.14" "ENSG00000115523.17"
## [7] "ENSG00000090382.7" "ENSG00000197746.15" "ENSG00000163220.11"
## [10] "ENSG00000162722.9" "ENSG00000019582.17" "ENSG00000087086.15"
## [13] "ENSG00000223609.11" "ENSG00000158869.11" "ENSG00000133742.14"
Compute PCA for each layer and store the results in the object. Its
Important to make unique identifiers for each layer/run or the results
will be overwritten and just saved as PCA. Here, the
runPCA functions from the scater
package is used.
set.seed(18)
redim_scae <- runPCA(redim_scae, ncomponents = 10, subset_row = top_ni_a, exprs_values = "logcounts", name = "PCA_a")
redim_scae <- runPCA(redim_scae, ncomponents = 10, subset_row = top_ni_g, exprs_values = "logcounts", name = "PCA_g")
redim_scae <- runPCA(redim_scae, ncomponents = 10, subset_row = top_ni_f, exprs_values = "logcounts", name = "PCA_f")
reducedDimNames(redim_scae)
## [1] "PCA_a" "PCA_g" "PCA_f"
The same goes for running t-SNE with the runTSNE
function from the scater
package. Unique identifiers are stated here for each layer as well.
set.seed(18)
redim_scae <- runTSNE(redim_scae, dimred= "PCA_a", perplexity = 50, name = "TSNE_a_50")
set.seed(18)
redim_scae <- runTSNE(redim_scae, dimred= "PCA_g", perplexity = 50, name = "TSNE_g_50")
set.seed(18)
redim_scae <- runTSNE(redim_scae, dimred= "PCA_f", perplexity = 50, name = "TSNE_f_50")
List of results from the performed reduced dimension analysis.
reducedDimNames(redim_scae)
## [1] "PCA_a" "PCA_g" "PCA_f" "TSNE_a_50" "TSNE_g_50" "TSNE_f_50"
saveRDS(redim_scae, file = "example_workflow_t_110_pca_tsne_perplex.rds")
redim_scae <- readRDS(file = "~/Work/R/Masterthesis/dev/SCAE/example_workflow_t_110_pca_tsne_perplex.rds")
Exemplary visualization for the t-SNE results on gene level for
immune genes that relate to HLA-class I. In the given datasets, these
are the immune genes HLA-A, HLA-B and
HLA-C plotted alongside their alleles. This allows for
insigts into potential genetic differences shown on allele-level.
which_tsne <- "TSNE_g_50"
tsne_g_a <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "HLA-A")
tsne_g_a1 <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "A*02:01:01:01")
tsne_g_a2 <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "A*26:01:01:01")
p2 <- tsne_g_a + tsne_g_a1 + tsne_g_a2
p2
tsne_g_b <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "HLA-B")
tsne_g_b1 <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "B*51:01:01:01")
tsne_g_b2 <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "B*57:01:01:01")
p3 <- tsne_g_b + tsne_g_b1 + tsne_g_b2
p3
which_tsne <- "TSNE_g_50"
tsne_g_c <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "HLA-C")
tsne_g_c1 <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "C*04:01:01:01")
tsne_g_c2 <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "C*06:02:01:01")
p1 <- tsne_g_c + tsne_g_c1 + tsne_g_c2
p1
sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] cowplot_1.1.1 patchwork_1.1.3
## [3] lubridate_1.9.2 forcats_1.0.0
## [5] stringr_1.5.0 dplyr_1.1.2
## [7] purrr_1.0.2 readr_2.1.4
## [9] tidyr_1.3.0 tibble_3.2.1
## [11] tidyverse_2.0.0 gridExtra_2.3
## [13] scater_1.22.0 ggplot2_3.4.3
## [15] scran_1.22.1 scuttle_1.4.0
## [17] SingleCellAlleleExperiment_0.0.0.9000 SingleCellExperiment_1.16.0
## [19] SummarizedExperiment_1.24.0 Biobase_2.54.0
## [21] GenomicRanges_1.46.1 GenomeInfoDb_1.30.1
## [23] IRanges_2.28.0 S4Vectors_0.32.4
## [25] BiocGenerics_0.40.0 MatrixGenerics_1.6.0
## [27] matrixStats_1.0.0 BiocStyle_2.22.0
##
## loaded via a namespace (and not attached):
## [1] Rtsne_0.16 ggbeeswarm_0.7.2
## [3] colorspace_2.1-0 bluster_1.4.0
## [5] XVector_0.34.0 BiocNeighbors_1.12.0
## [7] rstudioapi_0.15.0 farver_2.1.1
## [9] ggrepel_0.9.3 bit64_4.0.5
## [11] AnnotationDbi_1.56.2 fansi_1.0.4
## [13] xml2_1.3.5 R.methodsS3_1.8.2
## [15] sparseMatrixStats_1.6.0 cachem_1.0.8
## [17] knitr_1.43 jsonlite_1.8.7
## [19] cluster_2.1.4 dbplyr_2.3.3
## [21] R.oo_1.25.0 png_0.1-8
## [23] HDF5Array_1.22.1 BiocManager_1.30.22
## [25] compiler_4.1.2 httr_1.4.6
## [27] dqrng_0.3.0 Matrix_1.6-1
## [29] fastmap_1.1.1 limma_3.50.3
## [31] cli_3.6.1 BiocSingular_1.10.0
## [33] htmltools_0.5.6 prettyunits_1.1.1
## [35] tools_4.1.2 rsvd_1.0.5
## [37] igraph_1.5.1 gtable_0.3.3
## [39] glue_1.6.2 GenomeInfoDbData_1.2.7
## [41] rappdirs_0.3.3 Rcpp_1.0.11
## [43] jquerylib_0.1.4 rhdf5filters_1.6.0
## [45] vctrs_0.6.3 Biostrings_2.62.0
## [47] DelayedMatrixStats_1.16.0 xfun_0.40
## [49] beachmat_2.10.0 timechange_0.2.0
## [51] lifecycle_1.0.3 irlba_2.3.5.1
## [53] statmod_1.5.0 XML_3.99-0.14
## [55] org.Hs.eg.db_3.14.0 edgeR_3.36.0
## [57] zlibbioc_1.40.0 scales_1.2.1
## [59] hms_1.1.3 parallel_4.1.2
## [61] rhdf5_2.38.1 yaml_2.3.7
## [63] curl_5.0.2 memoise_2.0.1
## [65] sass_0.4.7 biomaRt_2.50.3
## [67] stringi_1.7.12 RSQLite_2.3.1
## [69] highr_0.10 ScaledMatrix_1.2.0
## [71] filelock_1.0.2 BiocParallel_1.28.3
## [73] rlang_1.1.1 pkgconfig_2.0.3
## [75] bitops_1.0-7 evaluate_0.21
## [77] lattice_0.21-8 Rhdf5lib_1.16.0
## [79] labeling_0.4.2 bit_4.0.5
## [81] tidyselect_1.2.0 magrittr_2.0.3
## [83] R6_2.5.1 generics_0.1.3
## [85] metapod_1.2.0 DelayedArray_0.20.0
## [87] DBI_1.1.3 pillar_1.9.0
## [89] withr_2.5.0 KEGGREST_1.34.0
## [91] RCurl_1.98-1.12 crayon_1.5.2
## [93] DropletUtils_1.14.2 utf8_1.2.3
## [95] BiocFileCache_2.2.1 tzdb_0.4.0
## [97] rmarkdown_2.24 viridis_0.6.4
## [99] progress_1.2.2 locfit_1.5-9.8
## [101] grid_4.1.2 blob_1.2.4
## [103] digest_0.6.33 R.utils_2.12.2
## [105] munsell_0.5.0 beeswarm_0.4.0
## [107] viridisLite_0.4.2 vipor_0.4.5
## [109] bslib_0.5.1